YARN-11011. Make YARN Router throw Exception to client clearly. #6211
slfan1989 merged 13 commits into apache:trunk
🎊 +1 overall
This message was automatically generated.

@goiri Can you help review this PR? Thank you very much!

```java
Throwable cause = e.getCause();
LOG.error("Cannot execute {} on {} : {}", request.getMethodName(),
    subClusterId.getId(), cause.getMessage());
exceptions.put(subClusterId, e);
```
I think it is good to keep the map even though we only output the values.
Thank you very much for your help in reviewing the code! I will improve this part of the code.
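To illustrate the reviewer's point about keeping the map, here is a minimal sketch (not the Router's code) that turns per-sub-cluster exceptions into a single client-facing message. It assumes sub-cluster IDs are plain strings standing in for `SubClusterId`, and `buildClientMessage` is a hypothetical helper. Keeping the key is what lets the message say which sub-cluster failed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class SubClusterErrors {

  // Hypothetical helper: joins per-sub-cluster failures into one message.
  // Keys are sub-cluster IDs (plain strings standing in for SubClusterId),
  // values are the exceptions each sub-cluster threw.
  static String buildClientMessage(Map<String, Exception> exceptions) {
    StringJoiner joiner = new StringJoiner("; ");
    for (Map.Entry<String, Exception> entry : exceptions.entrySet()) {
      Exception e = entry.getValue();
      // Fall back to the exception class name when there is no message.
      String msg = (e.getMessage() != null) ? e.getMessage() : e.getClass().getSimpleName();
      joiner.add(entry.getKey() + ": " + msg);
    }
    return joiner.toString();
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order, so the message is deterministic.
    Map<String, Exception> exceptions = new LinkedHashMap<>();
    exceptions.put("SC-1", new IllegalStateException("RM unreachable"));
    exceptions.put("SC-2", new RuntimeException());
    // prints: SC-1: RM unreachable; SC-2: RuntimeException
    System.out.println(buildClientMessage(exceptions));
  }
}
```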

@goiri Can you help review this PR again? Thank you very much!

@goiri Can you help review this PR again? Thank you very much!

💔 -1 overall
This message was automatically generated.

```java
if (t != null) {
  LOG.error(errMsg, t);
  throw new YarnException(errMsg, t);
  LOG.error(errMsg + "" + t.getMessage(), t);
```
Thank you very much for your help in reviewing the code! I will improve this part of the code.
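The snippet above logs the error and then throws a `YarnException` that carries the original throwable as its cause. The sketch below shows that log-and-rethrow pattern in plain Java. It is an illustration under assumptions: `YarnException` here is a local stand-in class (Hadoop is not on the classpath), `java.util.logging` replaces the Router's logger, `logAndThrow` is a hypothetical helper, and appending the cause's message to the thrown message (not just the log line) is a choice made for this sketch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class RethrowSketch {

  private static final Logger LOG = Logger.getLogger(RethrowSketch.class.getName());

  // Local stand-in for org.apache.hadoop.yarn.exceptions.YarnException so the
  // sketch compiles without Hadoop on the classpath.
  static class YarnException extends Exception {
    YarnException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Hypothetical helper: log once, then rethrow with the original throwable
  // attached as the cause so the client also sees the underlying reason.
  static void logAndThrow(String errMsg, Throwable t) throws YarnException {
    String fullMsg = (t != null) ? errMsg + " Cause: " + t.getMessage() : errMsg;
    LOG.log(Level.SEVERE, fullMsg, t);
    throw new YarnException(fullMsg, t);
  }

  public static void main(String[] args) {
    try {
      logAndThrow("getClusterMetrics failed.", new IllegalStateException("SC-1 is down"));
    } catch (YarnException e) {
      System.out.println(e.getMessage());  // getClusterMetrics failed. Cause: SC-1 is down
      System.out.println(e.getCause() instanceof IllegalStateException);  // true
    }
  }
}
```

Because the cause is chained rather than discarded, a client that only reads the top-level message still gets the underlying reason, and the full stack trace remains available through `getCause()`.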

```java
  cause = cause.getCause();
}
String errMsg = (cause.getMessage() != null) ? cause.getMessage() : "UNKNOWN";
return Pair.of(subClusterId, new YarnException(
```
Extract the exception for readability.
I will improve this code.
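The loop above walks the cause chain to the innermost throwable and falls back to `"UNKNOWN"` when it has no message; the reviewer asked for this logic to be extracted for readability. One possible extraction, sketched with hypothetical helper names `rootCause` and `rootCauseMessage` that do not come from the PR, looks like this:

```java
public class RootCause {

  // Hypothetical helper: follow getCause() down to the innermost throwable.
  // The identity check guards against a throwable reporting itself as its cause.
  static Throwable rootCause(Throwable t) {
    Throwable cause = t;
    while (cause.getCause() != null && cause.getCause() != cause) {
      cause = cause.getCause();
    }
    return cause;
  }

  // Mirrors the snippet's fallback: use "UNKNOWN" when there is no message.
  static String rootCauseMessage(Throwable t) {
    String msg = rootCause(t).getMessage();
    return (msg != null) ? msg : "UNKNOWN";
  }

  public static void main(String[] args) {
    Exception wrapped = new RuntimeException("outer",
        new IllegalStateException("middle", new java.io.IOException("Connection refused")));
    System.out.println(rootCauseMessage(wrapped));                 // Connection refused
    System.out.println(rootCauseMessage(new RuntimeException()));  // UNKNOWN
  }
}
```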

```java
Pair<SubClusterId, Object> pair = future.get();
subClusterId = pair.getKey();
Object result = pair.getValue();
if(result instanceof YarnException) {
```
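The reworked `invokeConcurrent` has each task return a pair of sub-cluster ID and either a result or an exception, which the caller inspects after `future.get()`. The following sketch reproduces that shape with JDK types only, under these assumptions: `Map.Entry` stands in for the commons-lang `Pair`, strings stand in for `SubClusterId`, `IllegalStateException` stands in for `YarnException`, and `invokeConcurrentSketch` is a hypothetical name. Combining all failures into one message is one possible policy, not necessarily the one the PR implements.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentInvokeSketch {

  // Hypothetical fan-out: one call per sub-cluster. Each task returns
  // (sub-cluster ID, result) or (sub-cluster ID, exception) instead of
  // throwing, so the caller knows exactly which sub-cluster failed.
  static <R> Map<String, R> invokeConcurrentSketch(Map<String, Callable<R>> calls)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, calls.size()));
    try {
      List<Future<Map.Entry<String, Object>>> futures = new ArrayList<>();
      for (Map.Entry<String, Callable<R>> call : calls.entrySet()) {
        String id = call.getKey();
        Callable<Map.Entry<String, Object>> task = () -> {
          try {
            return new SimpleEntry<String, Object>(id, call.getValue().call());
          } catch (Exception e) {
            return new SimpleEntry<String, Object>(id, e);
          }
        };
        futures.add(pool.submit(task));
      }
      Map<String, R> results = new LinkedHashMap<>();
      Map<String, Exception> failures = new LinkedHashMap<>();
      for (Future<Map.Entry<String, Object>> future : futures) {
        Map.Entry<String, Object> pair = future.get();
        Object value = pair.getValue();
        if (value instanceof Exception) {
          failures.put(pair.getKey(), (Exception) value);
        } else {
          @SuppressWarnings("unchecked")
          R result = (R) value;
          results.put(pair.getKey(), result);
        }
      }
      if (!failures.isEmpty()) {
        // Propagate every failing sub-cluster's message to the caller.
        StringBuilder msg = new StringBuilder("Failed on sub-clusters:");
        failures.forEach((scId, e) ->
            msg.append(' ').append(scId).append(" (").append(e.getMessage()).append(')'));
        throw new IllegalStateException(msg.toString());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    Map<String, Callable<Integer>> calls = new LinkedHashMap<>();
    calls.put("SC-1", () -> 10);
    calls.put("SC-2", () -> {
      throw new IllegalStateException("RM unreachable");
    });
    try {
      invokeConcurrentSketch(calls);
    } catch (Exception e) {
      // prints: Failed on sub-clusters: SC-2 (RM unreachable)
      System.out.println(e.getMessage());
    }
  }
}
```

Returning the exception as a value, instead of letting it escape as an `ExecutionException`, is what keeps the sub-cluster ID attached to the error.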

```java
import org.apache.hadoop.yarn.api.ApplicationClientProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.SubmitApplicationRequest;
import org.apache.hadoop.yarn.api.protocolrecords.SubmitApplicationResponse;
import org.apache.hadoop.yarn.api.protocolrecords.*;
```

@goiri Can you help review this PR again? Thank you very much!

@goiri Thank you very much for your help in reviewing the code!
…he#6211) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
Description of PR
JIRA: YARN-11011. Make YARN Router throw Exception to client clearly.
While using YARN Federation, users have reported that `FederationClientInterceptor` does not return sufficiently informative errors to the client. For instance, it may return a generic message such as "No active SubCluster available to submit the request," or no clear error message at all, forcing users to check the Router logs to find the cause.
After carefully reviewing the code, I found two categories of issues:
1. In cases such as `submitApplication` or `getNewApplication`, the unhelpful feedback is often caused by a misconfigured maximum retry count. For example, with 2 sub-clusters the maximum retry count should be set to 1. If users keep the default value of 3, the request can fail with "No active SubCluster available to submit the request." To address this, I will document the maximum retry count configuration in YARN-11594 to help users configure this parameter correctly.
2. In the case of `getClusterMetrics`, which merges results from multiple sub-clusters, I have optimized the `invokeConcurrent` method so that, when a sub-cluster throws an exception, the relevant error information is propagated directly to the client.

These optimizations aim to make the error messages returned to users more informative and easier to understand.
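The retry-count problem in the first item can be illustrated with a simplified model. This is a sketch, not the Router's implementation: it assumes that each retry goes to a sub-cluster that has not already failed, and `submitWithRetry` and the simulated failure message are hypothetical. With 2 sub-clusters, 1 retry means one attempt per sub-cluster, so the last real error reaches the caller; with 3 retries, the third attempt finds no sub-cluster left and the generic message replaces the real one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class RetrySketch {

  // Simplified model (assumption): each attempt goes to a sub-cluster that has
  // not failed yet; once every sub-cluster has failed, selection itself fails.
  // maxRetries is assumed to be >= 0.
  static String submitWithRetry(List<String> subClusters, int maxRetries,
      Function<String, String> submit) {
    Set<String> blacklist = new HashSet<>();
    RuntimeException last = null;
    // One initial attempt plus up to maxRetries retries.
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      String target = null;
      for (String sc : subClusters) {
        if (!blacklist.contains(sc)) {
          target = sc;
          break;
        }
      }
      if (target == null) {
        // Generic error: the real failure held in `last` never reaches the client.
        throw new IllegalStateException(
            "No active SubCluster available to submit the request.");
      }
      try {
        return submit.apply(target);
      } catch (RuntimeException e) {
        blacklist.add(target);
        last = e;
      }
    }
    // Attempts exhausted right after a real failure: surface that failure.
    throw last;
  }

  public static void main(String[] args) {
    List<String> subClusters = Arrays.asList("SC-1", "SC-2");
    // Simulated failure; the message is made up for illustration.
    Function<String, String> alwaysFails = sc -> {
      throw new IllegalStateException("Submission rejected by " + sc);
    };
    for (int maxRetries : new int[] {1, 3}) {
      try {
        submitWithRetry(subClusters, maxRetries, alwaysFails);
      } catch (IllegalStateException e) {
        System.out.println("maxRetries=" + maxRetries + " -> " + e.getMessage());
      }
    }
  }
}
```

Running `main` prints `maxRetries=1 -> Submission rejected by SC-2` followed by `maxRetries=3 -> No active SubCluster available to submit the request.`, which is the difference between a useful error and the generic one users reported.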
How was this patch tested?
Added JUnit tests.
For code changes:
`LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?